log lr hyperparams; add exact match rewards; fix qwen3-base configs; use user/system parts in prompts by andytwigg · Pull Request #3417 · AI-Hypercomputer/maxtext

andytwigg · 2026-03-14T03:06:45Z

Description

this PR includes several changes:
qwen3-base:

add missing qwen3-base configs

logging:

inject LR as hyperparam to let tunix log LR

rewards:

fix bug to use reward_exact_answer in check_numbers instead of hardcoded 1.5
add simple check_answer_simple_math reward fn
add reward_exact_answer reward (to separate from exact_format_match)
add _make_reward_fn wrapper
comment out check_answer due to overlap with check_numbers

lora:

WIP on lora (enabled=False by default)

data:

modify process_data to generate separate user/system parts in prompts

Tests

this setup has been used to show gsm8k and openmath-instruct training on qwen3-{1-8}B

Checklist

Before submitting this PR, please make sure (put X in square brackets):

I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
I have necessary comments in my code, particularly in hard-to-understand areas.
I have run end-to-end tests tests and provided workload links above if applicable.
I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

…axtext into atwigg/add_qwen3_base

…t_answer weight

codecov · 2026-03-14T03:35:02Z

Codecov Report

❌ Patch coverage is 10.16949% with 53 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/maxtext/trainers/post_train/rl/utils_rl.py	5.71%	33 Missing ⚠️
src/maxtext/trainers/post_train/rl/train_rl.py	11.11%	16 Missing ⚠️
src/maxtext/integration/tunix/tunix_adapter.py	20.00%	4 Missing ⚠️

📢 Thoughts on this report? Let us know!

richjames0

Thanks Andy. This also includes a bunch of configs so maybe just update the PR title to reflect that

… __name__, add base models to globals.py

andytwigg and others added 8 commits March 14, 2026 00:26

add qwen3-base variants and qwen3-1.7b

20a4262

add qwen3-base models to checkpoint_conversion script

5df17ce

add qwen3-base to configs/types and checkpoint_conversion/param_mapping

1ad2297

add qwen3-base configs to checkpoint_conversion/hf_model_configs

c5b7e56

Merge branch 'main' into atwigg/add_qwen3_base

b15c94e

pyink

34b7431

Merge branch 'atwigg/add_qwen3_base' of github.com:AI-Hypercomputer/m…

ef6e767

…axtext into atwigg/add_qwen3_base

add LR schedule as hyperparam to get logged by tunix; add reward_exac…

391dabc

…t_answer weight

use reward_exact_answer in check_numbers

1e72e25

richjames0 requested changes Mar 14, 2026

View reviewed changes

andytwigg added 2 commits March 16, 2026 18:30

adding lora (WIP), add simplemath reward, let tunix see the reward_fn…

2482121

… __name__, add base models to globals.py

add _make_reward_fn wrapper

0fad512

andytwigg changed the title ~~log lr hyperparams; add exact match reward~~ log lr hyperparams; add exact match rewards; fix qwen3-base configs; use user/system parts in prompts Mar 16, 2026

andytwigg requested a review from richjames0 March 16, 2026 18:55

andytwigg added 2 commits March 16, 2026 18:57

pyink

7e980c6

update rl_utils_test to use values in config

084c8ee

andytwigg closed this Mar 17, 2026

andytwigg deleted the atwigg/log_lr_hyperparam branch March 17, 2026 01:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

log lr hyperparams; add exact match rewards; fix qwen3-base configs; use user/system parts in prompts#3417

log lr hyperparams; add exact match rewards; fix qwen3-base configs; use user/system parts in prompts#3417
andytwigg wants to merge 13 commits intomainfrom
atwigg/log_lr_hyperparam

andytwigg commented Mar 14, 2026 •

edited

Loading

Uh oh!

codecov Bot commented Mar 14, 2026 •

edited

Loading

Uh oh!

richjames0 left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

andytwigg commented Mar 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Tests

Checklist

Uh oh!

codecov Bot commented Mar 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

richjames0 left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

andytwigg commented Mar 14, 2026 •

edited

Loading

codecov Bot commented Mar 14, 2026 •

edited

Loading